Towards a New Standard Arabic Test Collection for Mono- and Cross-Language Information Retrieval

Authors

  • Oussama Ben Khiroun
  • Raja Ayed
  • Bilel Elayeb
  • Ibrahim Bounhas
  • Narjès Bellamine Ben Saoud
  • Fabrice Evrard
Abstract

In this paper, we propose a new standard Arabic test collection for mono- and cross-language information retrieval (CLIR). To build it, we exploit the “Hadith” texts and provide a portal for sampling and evaluating Hadith results listed in both Arabic and English versions. The new collection, called “Kunuz”, aims to promote and restart the development of Arabic monolingual retrieval and CLIR systems, which has been stalled since the TREC-2001 and TREC-2002 editions.

Similar resources

Evaluating Arabic Retrieval from English or French Queries: The TREC-2001 Cross-Language Information Retrieval Track

The Cross-language information retrieval track at the 2001 Text Retrieval Conference (TREC-2001) produced the first large information retrieval test collection for Arabic. The collection contains 383,872 Arabic news stories, 25 topic descriptions in Arabic, English and French from which queries can be formed, and manual (ground truth) relevance judgments for a useful subset of the topic-documen...

The TREC 2002 Arabic/English CLIR Track

Nine teams participated in the TREC-2002 cross-language information retrieval track, which focused on retrieving Arabic language documents based on 50 topics that were originally prepared in English. Arabic translations of the topic descriptions were also made available to facilitate monolingual Arabic runs. This was the second year in which a large Arabic document collection was available. Thr...

Semantic Interoperability among Thesauri: A Challenge in the Multicultural Legal Domain

In the last few years crucial issues like cross-language legal information retrieval, document classification, legal knowledge discovery and extraction have been considered in theory and in practice. The availability of services allowing cross-language and cross-collection retrieval is a growing necessity. This paper focuses on the need to develop solutions for automatic, language-independent p...

Arabic Information Retrieval at UMass in TREC-10

The University of Massachusetts took on the TREC10 cross-language track with no prior experience with Arabic, and no Arabic speakers among any of our researchers or students. We intended to implement some standard approaches, and to extend a language modeling approach to handle co-occurrences. Given the lack of resources – training data, electronic bilingual dictionaries, and stemmers, and our ...

IIT at TREC 2002 Linear Combinations Based on Document Structure and Varied Stemming for Arabic Retrieval

For TREC 10 we participated in the Named Page Finding Task and the Cross-Lingual Task. In the web track, we explored the use of linear combinations of term collections based on document structure. Our goal was to examine the effects of different term collection statistics based on document structure in respect to known item retrieval. We parsed documents into structural components and built spe...

Publication date: 2014